Utilizing Domain Knowledge in End-to-End Audio Processing

Authors

  • Tycho Max Sylvester Tax
  • Jose Luis Diez Antich
  • Hendrik Purwins
  • Lars Maaløe
Abstract

End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly used log-scaled mel-spectrogram transformation. Secondly, we demonstrate that upon initializing the first layers of an end-to-end CNN classifier with the learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.
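
For orientation, the log-scaled mel-spectrogram that the first CNN layers are trained to approximate can be sketched in plain NumPy. This is only an illustration of the target transformation, not the authors' model or pipeline; the sample rate, FFT size, hop length, number of mel bands, and the function names are all assumptions chosen for the example and do not come from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, centre, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=22050, n_fft=1024, hop=512, n_mels=60, eps=1e-10):
    # All default parameter values are illustrative assumptions.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(mel + eps)                            # log scaling; eps avoids log(0)
```

Each stage here (windowed framing, a fixed linear projection, a pointwise non-linearity) has a natural counterpart in a strided convolution followed by an activation, which is one way to see why a few convolutional layers might plausibly learn this mapping.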

Similar resources

Structuring and Querying Personalized Audio Using Ontologies: Ph.D. Thesis Proposal

User-customized information selection and delivery reduces the complexity of the overwhelming amount of information available to end-users. Our approach employs user profiles, data selection, and presentation facilities to deliver customized audio information to end-users. Specifically, we construct a domain-dependent ontology (a collection of key concepts and their inter-relationships) to enab...

End-to-end learning for music audio tagging at scale

The lack of data tends to limit the outcomes of deep learning research – especially when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study we make use of musical labels annotated for 1.2 million tracks. This large amount of data allows us to unrestrictedly explore different front-end paradigms: from assumption-free models – using waveforms as input wit...

Metadata Tools for Digital Motion Picture Archives

Most of the video information retrieval systems today rely on some set of computationally extracted video and/or audio features, which may be complemented with manually created annotation that is usually either arduous to create or insufficient for capturing the content. This thesis looks at the specific domain of motion pictures to identify the computational features relevant to films and, mor...

Cipher text only attack on speech time scrambling systems using correction of audio spectrogram

Recently permutation multimedia ciphers were broken in a chosen-plaintext scenario. That attack models a very resourceful adversary which may not always be the case. To show insecurity of these ciphers, we present a cipher-text only attack on speech permutation ciphers. We show inherent redundancies of speech can pave the path for a successful cipher-text only attack. To that end, regularities ...

Software-Based Video/Audio Processing for Cellular Phones

Nowadays, most cellular phones are used beyond voice communication. Although the processing power of cellular phones is sufficient for most data applications, it is difficult to play video and audio contents in software because of their computational complexity and lack of basic tools for multimedia processing, so software-based multimedia processing on cellular phones is a challenging issue. S...


Journal title:
  • CoRR

Volume abs/1712.00254

Publication date 2017